Source Retrieval Plagiarism Detection based on Weighted Noun phrase and Key phrase Extraction
نویسندگان
چکیده
This paper describes an approach for source retrieval task of PAN 2015 competition. We apply two methods to extract important terms, namely weighted noun phrases and keyword phrases which are extracted from long sentences in terms of word count. Queries are constructed from top marked sentences. The prepared system tries to gather a complete dataset of downloaded sources and employ it in query filtering operations. The ChatNoir search API is used for submitted queries. Each query is split into two sub-queries and the system extract one snippet for each of sub-queries and exploits them in downloading operation. The evaluation results show high scores for three measures: recall, total queries number and no detection.
منابع مشابه
Word Formation Approach to Noun Phrase Analysis for Thai
Noun phrase analysis is one of the most important components in Natural Language Processing (NLP) applications, such as information retrieval, extraction and categorization. For Thai, noun phrase analysis has unique problems, i.e., noun phrase boundary identification, noun phrase decomposition and its relation extraction, and core noun detection. Statistical and rule based Word formation is, th...
متن کاملInvestigating Embedded Question Reuse in Question Answering
The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance through reuse of information in the answer to one question to answer another related question. Our analysis shows that a pair of question in a general open domain QA can have embedding relation through their mentions of noun phrase expressions. We present methods f...
متن کاملA Noun Phrase Parser of English
A noun phrase parser is useful for several purposes, e.g. for index term generation in an information retrieval application; for the extraction of collocational knowledge from large corpora for the development of computational tools for language analysis; for providing a shallow but accurately analysed input for a more ambitious parsing system; for the discovery of translation units, and so on....
متن کاملNoun-Phrase Analysis in Unrestricted Text for Information Retrieval
Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of a few simple, yet robust and efficient nounphrase analysis techniques to create better indexing phrases for information retrieval. In particular, we describe...
متن کاملExtracting Conceptual Terms from Medical Documents
Automated biomedical concept recognition is important for biomedical document retrieval and text mining research. In this paper, we describe a two-step concept extraction technique for documents in biomedical domain. Step one includes noun phrase extraction, which can automatically extract noun phrases from medical documents. Extracted noun phrases are used as concept term candidates which beco...
متن کامل